Robust Data Augmentation for Neural Machine Translation through EVALNET
Abstract
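Since building Neural Machine Translation (NMT) systems requires a large parallel corpus, various data augmentation techniques have been adopted, especially for low-resource languages. In order to achieve the best performance through data augmentation, an NMT system should be able to evaluate the quality of the augmented data. Several studies have addressed data weighting techniques to assess data quality. The basic idea adopted in previous studies is the loss value that a system calculates when learning from training data. The weight derived from the loss value of the data, through simple heuristic rules or neural models, can adjust the loss used in the next step of the learning process. In this study, we propose EvalNet, a data evaluation network, to assess the parallel data of NMT. EvalNet exploits the loss value, a cross-attention map, and the semantic similarity between parallel sentences as its features. The cross-attention map is an encoded representation of the cross-attention layers of the Transformer, which is the base architecture of the NMT system. The semantic similarity is the cosine distance between two semantic embeddings, one of the source sentence and one of the target sentence. Owing to the parallelism of the data, the combination of the cross-attention map and the semantic similarity proved to be an effective set of features for data quality evaluation, besides the loss value. EvalNet is the first NMT data evaluator network that introduces the cross-attention map and the semantic similarity as its features. Through experiments, we conclude that EvalNet is simple yet beneficial for robust training of an NMT system and outperforms previous approaches as a data evaluator.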
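To make the weighting idea concrete, here is a minimal sketch, assuming per-pair features are already available: the training loss, a cross-attention matrix (rows are target tokens, columns are source tokens), and sentence embeddings for source and target. It is not the authors' implementation. EvalNet encodes the full attention map and combines features with a trained neural network; here a hypothetical scalar summary (`attention_sharpness`, the mean of each row's peak weight) and a single logistic unit with hand-picked weights stand in for those components. The abstract describes the semantic feature as a cosine distance; the sketch computes cosine similarity, and distance is simply 1 minus that value. All function names and weights are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two sentence embeddings (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def attention_sharpness(attn: Sequence[Sequence[float]]) -> float:
    """Hypothetical stand-in for an encoded cross-attention map:
    mean of each target row's largest attention weight (peaked rows suggest clean alignment)."""
    if not attn:
        return 0.0
    return sum(max(row) for row in attn) / len(attn)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def score_pair(
    loss: float,
    attn: Sequence[Sequence[float]],
    src_emb: Sequence[float],
    tgt_emb: Sequence[float],
    w: Tuple[float, float, float] = (-1.0, 2.0, 2.0),  # placeholder weights, not learned
    b: float = 0.0,
) -> float:
    """Quality score in (0, 1) from (loss, attention summary, semantic similarity).
    A single logistic unit replaces the trained evaluator network."""
    features = (loss, attention_sharpness(attn), cosine_similarity(src_emb, tgt_emb))
    z = b + sum(wi * fi for wi, fi in zip(w, features))
    return sigmoid(z)


def reweighted_loss(losses: List[float], scores: List[float]) -> float:
    """Batch loss where each pair's loss is weighted by its normalized quality score."""
    total = sum(scores)
    if total == 0.0:
        return sum(losses) / len(losses)
    return sum(l * s for l, s in zip(losses, scores)) / total


if __name__ == "__main__":
    # A clean pair: low loss, peaked attention, similar embeddings.
    good = score_pair(1.0, [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]], [1, 0, 1], [1, 0.1, 0.9])
    # A noisy augmented pair: high loss, nearly uniform attention, unrelated embeddings.
    noisy = score_pair(4.0, [[0.34, 0.33, 0.33], [0.33, 0.34, 0.33]], [1, 0, 1], [0, 1, 0])
    print(f"good={good:.3f} noisy={noisy:.3f}")
    print("plain mean:", (1.0 + 4.0) / 2)
    print("reweighted:", round(reweighted_loss([1.0, 4.0], [good, noisy]), 3))
```

With these placeholder weights the noisy pair receives a much smaller score, so the reweighted batch loss is dominated by the clean pair. In a real system the weights would be learned, and the attention feature would come from the Transformer's cross-attention layers rather than a fixed summary.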
Similar resources
Data Augmentation for Low-Resource Neural Machine Translation
The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, syn...
Dynamic Data Selection for Neural Machine Translation
Intelligent selection of training data has proven a successful technique to simultaneously increase training efficiency and translation performance for phrase-based machine translation (PBMT). With the recent increase in popularity of neural machine translation (NMT), we explore in this paper to what extent and how NMT can also benefit from data selection. While state-of-the-art data selection ...
Improving Machine Translation through Linked Data
With the ever increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. In the case of Machine Translation, MT systems can potentially benefit from such a resource. Unknown words and ambiguous translat...
Pre-Translation for Neural Machine Translation
Recently, the development of neural machine translation (NMT) has significantly improved the translation quality of automatic machine translation. While most sentences are more accurate and fluent than translations by statistical machine translation (SMT)-based systems, in some cases, the NMT system produces translations that have a completely different meaning. This is especially the case when...
Neural Name Translation Improves Neural Machine Translation
In order to control computational complexity, neural machine translation (NMT) systems convert all rare words outside the vocabulary into a single unk symbol. Previous solution (Luong et al., 2015) resorts to use multiple numbered unks to learn the correspondence between source and target rare words. However, testing words unseen in the training corpus cannot be handled by this method. And it a...
Journal
Journal title: Mathematics
Year: 2022
ISSN: 2227-7390
DOI: https://doi.org/10.3390/math11010123